Ground Truth for Grammatical Error Correction Metrics

Authors

  • Courtney Napoles
  • Keisuke Sakaguchi
  • Matt Post
  • Joel R. Tetreault
Abstract

How do we know which grammatical error correction (GEC) system is best? A number of metrics have been proposed over the years, each motivated by weaknesses of previous metrics; however, the metrics themselves have not been compared to an empirical gold standard grounded in human judgments. We conducted the first human evaluation of GEC system outputs, and show that the rankings produced by metrics such as MaxMatch and I-measure do not correlate well with this ground truth. As a step towards better metrics, we also propose GLEU, a simple variant of BLEU, modified to account for both the source and the reference, and show that it hews much more closely to human judgments.
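To make the idea of comparing a metric's system ranking against a human ranking concrete, the sketch below computes Spearman's rank correlation between two lists of per-system scores. This is only an illustration: the abstract does not say which correlation statistic the authors used, and the function names and all scores shown are hypothetical, not taken from the paper.

```python
from typing import Sequence


def ranks(scores: Sequence[float]) -> list[float]:
    """Return 1-based ranks (highest score gets rank 1); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of systems tied with the one at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(metric_scores), ranks(human_scores))


# Hypothetical scores for four GEC systems (illustrative values only).
human_scores = [0.9, 0.7, 0.4, 0.1]
metric_a = [0.30, 0.35, 0.20, 0.25]  # swaps adjacent pairs relative to the human ranking
metric_b = [0.8, 0.6, 0.5, 0.2]      # orders the systems exactly as the humans do

print(spearman(metric_a, human_scores))
print(spearman(metric_b, human_scores))
```

A value near 1 means the metric orders the systems almost as the human judges do; lower values indicate that the metric and the human ranking disagree.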




Publication date: 2015